Using the package ggplot2

Elements of a plot

Additional components

Why use a grammar of graphics?

A variable in the data is mapped directly to an element in the plot
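
As a rough illustration of this mapping, the sketch below uses a small made-up data frame (`toy`, with hypothetical columns `age`, `score` and `dx`; it is not the autism data) and inspects what ggplot2 computed for the layer. Each variable named in `aes()` ends up as one column of graphical properties.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only.
toy <- data.frame(
  age   = c(2, 3, 5, 9),
  score = c(6, 7, 18, 25),
  dx    = c("pdd", "pdd", "autism", "autism")
)

# age -> horizontal position, score -> vertical position, dx -> point colour
p <- ggplot(toy, aes(x = age, y = score, colour = dx)) +
  geom_point()

# layer_data() returns the per-point graphical values ggplot2 will draw.
mapped <- layer_data(p)
mapped[, c("x", "y", "colour")]
```

Changing a mapping (say `shape = dx` instead of `colour = dx`) changes which graphical property carries the variable, without touching the data.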

Data - Autism

glimpse(autism)
# Observations: 604
# Variables: 7
# $ childid  <int> 1, 1, 1, 1, 1, 10, 10, 10, 10, 100, 100, 100, 100, 10...
# $ sicdegp  <fctr> high, high, high, high, high, low, low, low, low, hi...
# $ age2     <dbl> 0, 1, 3, 7, 11, 0, 1, 7, 11, 0, 1, 3, 7, 0, 1, 7, 11,...
# $ vsae     <int> 6, 7, 18, 25, 27, 9, 11, 18, 39, 15, 24, 37, 135, 8, ...
# $ gender   <fctr> male, male, male, male, male, male, male, male, male...
# $ race     <fctr> white, white, white, white, white, white, white, whi...
# $ bestest2 <fctr> pdd, pdd, pdd, pdd, pdd, autism, autism, autism, aut...

Plotting points

ggplot(autism, aes(x=age2, y=vsae)) + 
  geom_point()

Your turn

How is the data mapped to graphical elements?

Jittering points

ggplot(autism, aes(x=age2, y=vsae)) + 
  geom_jitter()

Your turn

How is the data mapped to graphical elements?

Adding lines

ggplot(autism, aes(x=age2, y=vsae)) + 
  geom_point() + geom_line()

Not the lines we want

ggplot(autism, aes(x=age2, y=vsae, group=childid)) + 
  geom_point() + geom_line()

Too much ink

ggplot(autism, aes(x=age2, y=vsae, group=childid)) + 
  geom_point() + geom_line(alpha=0.5)

ggplot(autism, aes(x=age2, y=vsae, group=childid)) + 
  geom_line(alpha=0.2) + theme_bw()

Now we can see that some individuals degrade, while most improve with time.

Log scale y?

ggplot(autism, aes(x=age2, y=vsae, group=childid)) + 
  geom_line(alpha=0.2) + scale_y_log10() + theme_bw()

By age 2 diagnosis

ggplot(autism, aes(x=age2, y=vsae, group=childid, colour=bestest2)) + 
  geom_line(alpha=0.3) + scale_y_log10() + theme_bw()

Now we can see a lot of overlap between the two groups.

Refine groups

ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) + 
  geom_point(alpha=0.1) + geom_line(aes(group=childid), alpha=0.1) + 
  geom_smooth(se=FALSE) + scale_y_log10() + theme_bw()

ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) + 
  geom_point(alpha=0.1) + geom_line(aes(group=childid), alpha=0.1) + 
  geom_smooth(se=FALSE, method="lm") + scale_y_log10() + theme_bw()

Your turn

What do we learn about autism, age, and the diagnosis at age 2?

In terms of categorisation into either pdd or autism, the vsae score is not distinct, but on average the 2-year-olds diagnosed with autism have lower scores. There is a lot of overlap between the groups.

Your turn

How is the data mapped to graphical elements?

A different look

ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) + 
  geom_boxplot() + scale_y_log10()

That’s not what I wanted ….

For each age measured

ggplot(autism, aes(x=factor(age2), y=vsae, colour=bestest2)) + 
  geom_boxplot() + scale_y_log10()

Which is better?

library(gridExtra)  # provides grid.arrange()
p1 <- ggplot(autism, aes(x=age2, y=vsae, colour=bestest2)) + 
  geom_point(alpha=0.1) + geom_line(aes(group=childid), alpha=0.1) + 
  geom_smooth(se=FALSE) + scale_y_log10() + theme(legend.position="none")
p2 <- ggplot(autism, aes(x=factor(age2), y=vsae, colour=bestest2)) + 
  geom_boxplot() + scale_y_log10() + theme(legend.position="none")
grid.arrange(p1, p2, ncol=2)

New example - Flying etiquette

41% Of Fliers Think You’re Rude If You Recline Your Seat

fly <- read_csv("./data/flying-etiquette.csv")
glimpse(fly)
# Observations: 1,040
# Variables: 27
# $ RespondentID                                                                                                                             <dbl> ...
# $ How often do you travel by plane?                                                                                                        <chr> ...
# $ Do you ever recline your seat when you fly?                                                                                              <chr> ...
# $ How tall are you?                                                                                                                        <int> ...
# $ Do you have any children under 18?                                                                                                       <chr> ...
# $ In a row of three seats, who should get to use the two arm rests?                                                                        <chr> ...
# $ In a row of two seats, who should get to use the middle arm rest?                                                                        <chr> ...
# $ Who should have control over the window shade?                                                                                           <chr> ...
# $ Is itrude to move to an unsold seat on a plane?                                                                                          <chr> ...
# $ Generally speaking, is it rude to say more than a few words tothe stranger sitting next to you on a plane?                               <chr> ...
# $ On a 6 hour flight from NYC to LA, how many times is it acceptable to get up if you're not in an aisle seat?                             <chr> ...
# $ Under normal circumstances, does a person who reclines their seat during a flight have any obligation to the person sitting behind them? <chr> ...
# $ Is itrude to recline your seat on a plane?                                                                                               <chr> ...
# $ Given the opportunity, would you eliminate the possibility of reclining seats on planes entirely?                                        <chr> ...
# $ Is it rude to ask someone to switch seats with you in order to be closer to friends?                                                     <chr> ...
# $ Is itrude to ask someone to switch seats with you in order to be closer to family?                                                       <chr> ...
# $ Is it rude to wake a passenger up if you are trying to go to the bathroom?                                                               <chr> ...
# $ Is itrude to wake a passenger up if you are trying to walk around?                                                                       <chr> ...
# $ In general, is itrude to bring a baby on a plane?                                                                                        <chr> ...
# $ In general, is it rude to knowingly bring unruly children on a plane?                                                                    <chr> ...
# $ Have you ever used personal electronics during take off or landing in violation of a flight attendant's direction?                       <chr> ...
# $ Have you ever smoked a cigarette in an airplane bathroom when it was against the rules?                                                  <chr> ...
# $ Gender                                                                                                                                   <chr> ...
# $ Age                                                                                                                                      <chr> ...
# $ Household Income                                                                                                                         <chr> ...
# $ Education                                                                                                                                <chr> ...
# $ Location (Census Region)                                                                                                                 <chr> ...

Variables

A mix of categorical and quantitative variables. What mappings are appropriate? Area for counts of categories; side-by-side boxplots for a mixed pair.

Support

ggplot(fly, aes(x=`How often do you travel by plane?`)) + 
  geom_bar() + coord_flip()

Categories are not sorted

Sorted categories

fly$`How often do you travel by plane?` <- 
  factor(fly$`How often do you travel by plane?`, levels=c(
    "Never","Once a year or less","Once a month or less",
    "A few times per month","A few times per week","Every day"))
ggplot(fly, aes(x=`How often do you travel by plane?`)) +
  geom_bar() + coord_flip()

Filter data

fly_sub <- fly %>% filter(`How often do you travel by plane?` %in% 
                            c("Once a year or less",
                              "Once a month or less")) %>%
  filter(!is.na(`Do you ever recline your seat when you fly?`)) %>%
  filter(!is.na(Age)) %>% filter(!is.na(Gender))

Recline by height

fly_sub$`Do you ever recline your seat when you fly?` %>% unique()
# [1] "About half the time" "Usually"             "Always"             
# [4] "Once in a while"     "Never"

fly_sub$`Do you ever recline your seat when you fly?` <- factor(
  fly_sub$`Do you ever recline your seat when you fly?`, levels=c(
    "Never","Once in a while","About half the time",
    "Usually","Always"))
ggplot(fly_sub, aes(y=`How tall are you?`, 
                    x=`Do you ever recline your seat when you fly?`)) + 
  geom_boxplot() #+ coord_flip()

Cheat sheet

Take a look at the ggplot2 Cheat sheet

Your turn

How many geoms are available in ggplot2? What is geom_rug?

p <- ggplot(autism, aes(x=age2, y=vsae))
p1 <- p + geom_point() + coord_flip()
p2 <- p + geom_point() + geom_rug() + coord_flip()
p3 <- p + geom_point() + geom_rug(position='jitter') + coord_flip()
grid.arrange(p1, p2, p3, nrow=3)

Your turn

What is the difference between colour and fill?

Colour applies to 0- or 1-dimensional elements (points, lines, outlines); fill applies to area (2-d) geoms.
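
A minimal sketch of the difference, using a made-up data frame (`toy` and its column `grp` are hypothetical, not part of the survey data). Mapping to `colour` changes only the bar outlines, while mapping to `fill` changes the bar interiors.

```r
library(ggplot2)

# Hypothetical toy data: three categories with different counts.
toy <- data.frame(grp = c("a", "a", "b", "b", "b", "c"))

# colour= varies the outline; the interior is fixed to white so the outline shows.
p_colour <- ggplot(toy, aes(x = grp, colour = grp)) +
  geom_bar(fill = "white")

# fill= varies the interior area of each bar.
p_fill <- ggplot(toy, aes(x = grp, fill = grp)) +
  geom_bar()
```

Printing `p_colour` and `p_fill` side by side (for example with `grid.arrange()`) makes the contrast visible.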

Your turn

What does coord_fixed() do? What is the difference between this and using theme(aspect.ratio=...)?

p <- ggplot(autism, aes(x=age2, y=vsae))
p1 <- p + geom_point() + coord_fixed(ratio = 1)
p2 <- p + geom_point() + theme(aspect.ratio = 1)
grid.arrange(p1, p2, ncol=2)

coord_fixed() operates on the raw data values, but theme(aspect.ratio=...) works on the plot dimensions.

Your turn

What are scales? How many numeric transformation scales are there?

Scales transform data values into graphical element values. They are most often applied to position along x or y, for example as a log or sqrt transformation. There are three numeric transformation scales: log10, sqrt, and reverse.
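
To make this concrete, here is a small sketch on invented data (`toy` is hypothetical) that applies ggplot2's three numeric position transformations to the same plot. `layer_data()` shows the transformed values that are actually placed on the axis.

```r
library(ggplot2)

# Hypothetical toy data spanning several orders of magnitude.
toy <- data.frame(x = 1:5, y = c(1, 10, 100, 1000, 10000))

p <- ggplot(toy, aes(x = x, y = y)) + geom_point()

p_log  <- p + scale_y_log10()    # y positions become log10(y)
p_sqrt <- p + scale_y_sqrt()     # y positions become sqrt(y)
p_rev  <- p + scale_y_reverse()  # y axis is flipped

# Inspect the transformed positions for the log scale.
layer_data(p_log)$y
```

The axis labels still show the original data values; only the spacing along the axis changes.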

Your turn

What are position adjustments? When would they be used?

Position adjustments shift elements away from their original coordinates. They are most often used with bar charts, to stack bars or to put them side by side.
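
A short sketch of the three adjustments most often used with bars, on a made-up data frame (`toy`, `answer` and `group` are hypothetical names):

```r
library(ggplot2)

# Hypothetical toy data: two answers, two groups.
toy <- data.frame(
  answer = c("yes", "yes", "yes", "no", "no", "no", "no"),
  group  = c("a", "b", "b", "a", "a", "b", "b")
)

p <- ggplot(toy, aes(x = answer, fill = group))

p_stack <- p + geom_bar()                    # default: bars stacked, heights are counts
p_dodge <- p + geom_bar(position = "dodge")  # bars side by side
p_fill  <- p + geom_bar(position = "fill")   # stacked and rescaled to proportions
```

Stacking shows totals, dodging makes the groups directly comparable, and filling shows the composition within each category.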

Your turn

Use your cheat sheet to work out how to make a plot to explore the relationship between

Do you ever recline your seat when you fly? and Is it rude to recline your seat on a plane?

unique(fly_sub$`Is itrude to recline your seat on a plane?`)
# [1] "Yes, somewhat rude"  "No, not rude at all" "Yes, very rude"
unique(fly_sub$`Do you ever recline your seat when you fly?`)
# [1] About half the time Usually             Always             
# [4] Once in a while     Never              
# Levels: Never Once in a while About half the time Usually Always
ggplot(fly_sub, aes(x=`Do you ever recline your seat when you fly?`)) +
  geom_bar() + 
  facet_wrap(~`Is itrude to recline your seat on a plane?`, ncol=3) +
  coord_flip()

ggplot(fly_sub, aes(x=`Do you ever recline your seat when you fly?`,
                    fill=`Is itrude to recline your seat on a plane?`)) +
  geom_bar()

ggplot(fly_sub, aes(x=`Do you ever recline your seat when you fly?`,
                    fill=`Is itrude to recline your seat on a plane?`)) +
  geom_bar(position="dodge")

Facets

ggplot(fly_sub, 
       aes(x=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar() + coord_flip() + facet_wrap(~Gender)

Facets

fly_sub$Age <- factor(fly_sub$Age, 
                      levels=c("18-29","30-44","45-60","> 60"))
ggplot(fly_sub, 
       aes(x=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar() + coord_flip() + facet_grid(Age~Gender)

Color palettes - default

p <- ggplot(fly_sub, 
            aes(x=`In general, is itrude to bring a baby on a plane?`,
                fill=Gender)) + 
  geom_bar(position="fill") + coord_flip() + 
  facet_wrap(~Age, ncol=5)
p

Color palettes - brewer

p + scale_fill_brewer(palette="Dark2") 

Color blind-proofing

What it looks like to a color-blind viewer:

library(scales)
library(dichromat)
p1 <- p + theme(legend.position = "none")
clrs <- hue_pal()(3)
clrs <- dichromat(clrs)
p2 <- p + scale_fill_manual("", values=clrs) + 
  theme(legend.position = "none")
grid.arrange(p1, p2)

Perceptual principles

Hierarchy of mappings

  1. Position - common scale (BEST)
  2. Position - nonaligned scale
  3. Length, direction, angle
  4. Area
  5. Volume, curvature
  6. Shading, color (WORST)
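
One way to get a feel for the two ends of this hierarchy is to encode the same values twice, once as position on a common scale and once as shading. The sketch below uses made-up data (`toy`, `item` and `value` are hypothetical).

```r
library(ggplot2)

# Hypothetical toy data: four items with similar values.
toy <- data.frame(item = c("a", "b", "c", "d"), value = c(10, 11, 12, 13))

# Position on a common scale: small differences are easy to read.
p_position <- ggplot(toy, aes(x = item, y = value)) +
  geom_point()

# Shading: the same values as tile colour, much harder to rank accurately.
p_shade <- ggplot(toy, aes(x = item, y = 1, fill = value)) +
  geom_tile()
```

Try ranking the four items from each plot separately; the position plot should make this considerably easier.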

Pre-attentive

Can you find the odd one out?

df <- data.frame(x=runif(100), y=runif(100), 
                 cl=sample(c(rep("A", 1), rep("B", 99))))
ggplot(data=df, aes(x, y, shape=cl)) + theme_bw() + 
  geom_point() + theme(legend.position="none", aspect.ratio=1)


Is it easier now?

ggplot(data=df, aes(x, y, colour=cl)) + 
  geom_point() + theme_bw() + 
  theme(legend.position="none", aspect.ratio=1)

Color palettes


library(RColorBrewer)
display.brewer.all()

Proximity

ggplot(fly_sub, aes(x=`In general, is itrude to bring a baby on a plane?`,
                    fill=Gender)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5)

With this arrangement we can see proportion of gender within each rudeness category, and compare these across age groups. How could we arrange this differently?

Proximity

ggplot(fly_sub, aes(x=Gender, 
                    fill=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5) +
  theme(legend.position="bottom")

What is different about the comparison now?

Another arrangement

ggplot(fly_sub, aes(x=Age,
                    fill=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Gender, ncol=5) + 
  theme(legend.position="bottom")

Themes

The ggthemes package has many different styles for plots. Other packages, such as xkcd, skittles, wesanderson, and beyonce, offer more.

ggplot(fly_sub, aes(x=Gender,
                    fill=`In general, is itrude to bring a baby on a plane?`)) + 
  geom_bar(position="fill") + coord_flip() + facet_wrap(~Age, ncol=5) +
  theme_xkcd() + theme(legend.position="bottom")

See the vignette for instructions on installing the xkcd font.

Resources

Share and share alike

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.